METHOD AND APPARATUS FOR FAULT CLASSIFICATION BASED ON RESIDUAL VECTORS

BACKGROUND OF THE INVENTION 

1. FIELD OF THE INVENTION 

This invention relates generally to the field of fault classification in an industrial 
process and, more particularly, to a method and apparatus for fault classification based on 
residual vectors. 

2. DESCRIPTION OF THE RELATED ART

There is a constant drive within the semiconductor industry to increase the quality, 
reliability and throughput of integrated circuit devices, e.g., microprocessors, memory 
devices, and the like. This drive is fueled by consumer demands for higher quality computers 
and electronic devices that operate more reliably. These demands have resulted in a 
continual improvement in the manufacture of semiconductor devices, e.g., transistors, as well
as in the manufacture of integrated circuit devices incorporating such transistors. 
Additionally, reducing the defects in the manufacture of the components of a typical 
transistor also lowers the overall cost per transistor as well as the cost of integrated circuit 
devices incorporating such transistors. 

The technologies underlying semiconductor processing tools have attracted increased attention over the last several years, resulting in substantial refinements. However, despite the advances made in this area, many of the processing tools that are currently commercially available suffer certain deficiencies. In particular, such tools often lack advanced process data monitoring capabilities, such as the ability to provide historical parametric data in a user-friendly format, event logging, real-time graphical display of both current processing parameters and the processing parameters of the entire run, and remote (i.e., local site and worldwide) monitoring. These deficiencies can engender nonoptimal control of critical processing parameters, such as throughput, accuracy, stability and repeatability, processing temperatures, mechanical tool parameters, and the like. The resulting variability manifests itself as within-run disparities, run-to-run disparities, and tool-to-tool disparities that can propagate into deviations in product quality and performance. An ideal monitoring and diagnostics system for such tools would provide a means of monitoring this variability, as well as a means for optimizing control of critical parameters.

2000.110700/DIR
TT5511

Semiconductor devices are manufactured from wafers of a semiconducting material. Layers of materials are added, removed, and/or treated during fabrication to create the electrical circuits that make up the device. The fabrication essentially comprises four basic operations. Although there are only four basic operations, they can be combined in hundreds of different ways, depending upon the particular fabrication process.

The four operations typically used in the manufacture of semiconductor devices are:

• layering, or adding thin layers of various materials to a wafer from which a semiconductor device is produced;
• patterning, or removing selected portions of added layers;
• doping, or placing specific amounts of dopants in the wafer surface through openings in the added layers; and
• heat treatment, or heating and cooling the materials to produce desired effects in the processed wafer.
Occasionally, during the fabrication process, one or more process steps are not performed as expected on a production wafer. Such conditions may be due to an error in the fabrication facility's automated work flow system (e.g., a database or control script error), a tool failure, or an operator error. If the abnormal process steps occur early in the fabrication process, it is not uncommon for the faulty wafer to undergo many subsequent steps before the faulty fabrication is identified. Once a fault is identified, further analysis is often necessary to determine the nature or cause of the fault, unless the fault is grossly obvious. This process is typically referred to as fault classification. Fault classification may be time consuming and may require significant human intervention. Improved fault classification shortens the time needed to respond to and correct defect conditions.

The present invention is directed to overcoming, or at least reducing the effects of, 
one or more of the problems set forth above. 

SUMMARY OF THE INVENTION 

One aspect of the present invention is seen in a method that includes receiving a current residual vector. The current residual vector is compared to a plurality of historical residual vectors. Each historical residual vector has an associated fault classification code. At least one of the historical residual vectors is selected responsive to determining that the current residual vector matches at least one of the historical residual vectors. A fault condition is classified based on the fault classification code associated with the selected historical residual vector.

Another aspect of the present invention is seen in a system including a fault detection 
unit adapted to generate a current residual vector and a fault classification unit. The fault 
classification unit is adapted to receive the current residual vector, compare the current 
residual vector to a plurality of historical residual vectors, each historical residual vector 
having an associated fault classification code, select at least one of the historical residual
vectors responsive to determining that the current residual vector matches at least one of the 
historical residual vectors, and classify a fault condition based on the fault classification code 
associated with the selected historical residual vector. 

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may be understood by reference to the following description taken in 
conjunction with the accompanying drawings, in which like reference numerals identify like 
elements, and in which: 

Figure 1 is a simplified block diagram of a manufacturing system in accordance with 
one illustrative embodiment of the present invention;

Figure 2 is a graph illustrating the comparison between a current residual vector and 
historical residual vectors; and 

Figure 3 is a simplified flow diagram of a method for classifying faults based on 
residual vectors in accordance with another illustrative embodiment of the present invention. 

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

Illustrative embodiments of the invention are described below. In the interest of 
clarity, not all features of an actual implementation are described in this specification. It will 
of course be appreciated that in the development of any such actual embodiment, numerous 
implementation-specific decisions must be made to achieve the developers' specific goals,
such as compliance with system-related and business-related constraints, which will vary 
from one implementation to another. Moreover, it will be appreciated that such a 
development effort might be complex and time-consuming, but would nevertheless be a 
routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. 

Referring to Figure 1, a simplified block diagram of an illustrative manufacturing system 10 is provided. In the illustrated embodiment, the manufacturing system 10 is adapted to fabricate semiconductor devices. Although the invention is described as it may be implemented in a semiconductor fabrication facility, the invention is not so limited and may be applied to other manufacturing environments. The techniques described herein may be applied to a variety of workpieces or manufactured items, including, but not limited to, microprocessors, memory devices, digital signal processors, application specific integrated circuits (ASICs), or other devices. The techniques may also be applied to workpieces or manufactured items other than semiconductor devices.

A network 20 interconnects various components of the manufacturing system 10, allowing them to exchange information. The illustrative manufacturing system 10 includes a plurality of tools 30-80. Each of the tools 30-80 may be coupled to a computer (not shown) for interfacing with the network 20. The tools 30-80 are grouped into sets of like tools, as denoted by lettered suffixes. For example, the set of tools 30A-30C represents tools of a certain type, such as chemical mechanical planarization tools. A particular wafer or lot of wafers progresses through the tools 30-80 as it is being manufactured, with each tool 30-80 performing a specific function in the process flow. Exemplary processing tools for a semiconductor device fabrication environment include metrology tools, photolithography steppers, etch tools, deposition tools, polishing tools, rapid thermal processing tools, implantation tools, etc. The tools 30-80 are illustrated in a rank and file grouping for illustrative purposes only. In an actual implementation, the tools 30-80 may be arranged in any physical order or grouping. Additionally, the connections between the tools in a particular grouping are meant to represent connections to the network 20, rather than interconnections between the tools 30-80.

A manufacturing execution system (MES) server 90 directs the high level operation of the manufacturing system 10. The MES server 90 monitors the status of the various entities in the manufacturing system 10 (i.e., lots, tools 30-80) and controls the flow of articles of manufacture (e.g., lots of semiconductor wafers) through the process flow. A database server 100 is provided for storing data related to the status of the various entities and articles of manufacture in the process flow. The database server 100 may store information in one or more data stores 110. The data may include pre-process and post-process metrology data, tool states, lot priorities, etc.

Portions of the invention and corresponding detailed description are presented in terms of software, or algorithms and symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the ones by which those of ordinary skill in the art effectively convey the substance of their work to others of ordinary skill in the art. An algorithm, as the term is used here, and as it is used generally, is conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of optical, electrical, or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, or as is apparent from the discussion, terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The MES server 90 stores information in the data store 110 related to the particular tools 30-80 (or sensors (not shown) associated with the tools 30-80) used to process each lot of wafers. As metrology data are collected for the lot, the metrology data and a tool identifier indicating the identity of the metrology tool recording the measurements are also stored in the data store 110. The metrology data may include feature measurements, process layer thicknesses, electrical performance, surface profiles, etc. Data stored for the tools 30-80 may include chamber pressure, chamber temperature, anneal time, implant dose, implant energy, plasma energy, processing time, etc. Data associated with the operating recipe settings used by the tool 30-80 during the fabrication process may also be stored in the data store 110. For example, where direct values cannot be measured for some process parameters, those values may be determined from the operating recipe settings in lieu of actual process data from the tool 30-80.

The manufacturing system 10 includes a fault detection unit 120 executing on a workstation 130 and a fault classification unit 140 executing on a workstation 150. In general, the fault detection unit 120 identifies fault conditions in the manufacturing system 10, and the fault classification unit 140 classifies the identified faults based on residual vectors associated with the processing of an associated wafer or lot of wafers. The residual vectors are compared to a library of historical residual vectors, and fault conditions are classified based on the comparison.

The distribution of the processing and data storage functions amongst the different computers 90, 100, 130, 150 is generally conducted to provide independence and a central information store. Of course, different numbers of computers and different arrangements may be used. Moreover, the functions of some units may be combined. For example, the fault detection and classification units 120, 140 may be combined into a single unit.

In general, the fault detection unit 120 is a model-based, multivariate fault detection analysis engine. The construction and operation of such fault detection tools are known to those of ordinary skill in the art. An exemplary commercially available fault detection engine is ModelWare™ offered by Triant, Inc. of Nanaimo, British Columbia, Canada. The fault detection unit 120 typically predicts values for various characteristics of the processing tool and/or processed wafer, compares the expected data with actual data collected by the tools 30-80 (or by sensors associated with the tools 30-80) and by metrology tools that measure electrical or physical characteristics of the processed wafers, and identifies defects based on the differences therebetween. For clarity and to avoid obscuring the present invention, the fault detection unit 120 is not discussed in greater detail herein.

The fault classification unit 140 either constructs a residual vector or receives the residual vector from the fault detection unit 120. A residual vector represents the differences between the expected values and the actual values of the various characteristics evaluated by the fault detection unit 120 in identifying the fault condition:

$$S_{\text{residual}} = S_{\text{actual}} - S_{\text{expected}} \qquad (1)$$
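As a minimal illustration of the residual relationship above, the sketch below computes a residual vector element by element in Python. The function name `residual_vector` and the readings are hypothetical; the parameter choice (temperature, pressure, mass flow rate) follows the example given in this description.

```python
def residual_vector(actual, expected):
    """Return S_residual = S_actual - S_expected, element by element."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected values must cover the same parameters")
    return [a - e for a, e in zip(actual, expected)]


# Hypothetical readings for one run: [temperature, pressure, mass flow rate]
actual = [452.0, 101.5, 49.0]
expected = [450.0, 100.0, 50.0]  # values predicted by the fault detection model
print(residual_vector(actual, expected))  # [2.0, 1.5, -1.0]
```

Positive components mean the measured value exceeded the prediction; the sign pattern across components is what gives each fault type its characteristic signature.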

The particular parameters included in the residual vector may vary widely depending on the nature of the fault detection unit 120, the type of fault being detected and classified, the data collection capabilities of the tools 30-80 or sensors, and the types of metrology data collected. Data in the residual vector may be based on metrology data (e.g., site-level thicknesses), sensor trace data (e.g., pressure, temperature, gas flow rate), summary statistics at different aggregations (e.g., mean pressure by lot/wafer/recipe/step), or a combination of any of the above. The application of the present invention is not limited to any particular selection of parameters in the residual vector.

In one example, the residual vector may include temperature, pressure, and mass flow 
rate data for a fabrication process performed by one of the tools 30-80. 
The fault classification unit 140 uses the residual vector to distinguish between types of faults. Each fault type typically produces a characteristic response in the values (e.g., tool or metrology data) used as inputs to the fault detection algorithm. The individual components of the residual vector are determined for each fault analysis application.

Each existing fault classification record represented by a historical residual vector has an associated fault classification code. For example, when the cause of a fault condition is identified (e.g., tool fault X), a fault classification code is associated in the data store 110 with the residual vector generated when the fault was detected. If the fault classification unit 140 matches a current residual vector with a historical residual vector, the fault classification for the current wafer may be inferred from the fault classification code associated with the matched historical residual vector. If the fault classification unit 140 fails to match a current residual vector with any historical residual vector, a new fault classification code may be initiated. Human intervention may be necessary to assign a meaning to the new fault classification code after the fault condition is diagnosed.

In some embodiments, the residual vector may be preprocessed. For example, in some cases, the same fault type may be indicated with different degrees of intensity (e.g., a small shift in temperature or a large shift in temperature). The residual vector may then be preprocessed by normalizing it, that is, by scaling it so that its length equals one:

$$\hat{S}_{\text{residual}} = \frac{S_{\text{residual}}}{\lvert S_{\text{residual}} \rvert} \qquad (2)$$

However, in some cases, normalizing may not be desired. For instance, if different fault types have similar residual vectors that differ only in magnitude, normalizing would not be performed, because it would map those vectors onto the same unit vector and make the fault types indistinguishable. It is contemplated that the fault classification unit 140 may operate multiple times on the same residual vector, with or without preprocessing, to classify a fault condition.
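The normalization step described above, and the reason it is sometimes skipped, can be sketched as follows. This is an illustrative Python helper, not code from the described system; the example vectors are hypothetical.

```python
import math


def normalize(vector):
    """Scale a residual vector to unit length: S / |S|."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0.0:
        raise ValueError("a zero residual vector has no direction to normalize")
    return [x / length for x in vector]


small_shift = [3.0, 4.0]    # hypothetical [temperature, pressure] residuals
large_shift = [30.0, 40.0]  # same direction, ten times the magnitude
print(normalize(small_shift))  # [0.6, 0.8]
print(normalize(large_shift))  # [0.6, 0.8]
```

Both shifts map to the same unit vector. That is desirable when they are two intensities of one fault type, and it is exactly why normalization is skipped when two fault types differ only in magnitude.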
Once the residual vector has been preprocessed (if desired), the fault classification unit 140 compares it to other historical vectors/data points that have been stored in a database (e.g., the data store 110). In a first embodiment, the fault classification unit 140 calculates the distance between the ends of the vectors. One technique for calculating the distance is to calculate the distance between two points in space using the following equation:



$$D = \sqrt{\sum_{i=1}^{n} \left(S_{1,i} - S_{2,i}\right)^2} \qquad (4)$$



where:

• D is the distance between the new and stored data points;
• S represents the data for a vector/data point; $S_1$ is the new data point and $S_2$ is the stored data point;
• i refers to an index into the sensors (e.g., 1 may mean temperature, 2 may mean pressure, etc.); and
• n refers to the number of sensors/values in each vector/data point.

If the distance between the new vector/data point and the stored vector/data point is less than a predefined tolerance, then the fault described by the new data point can be considered to be of the same classification as the stored data point. The distance technique may be used whether or not the residual vector is normalized.
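A minimal Python sketch of the first distance technique follows: the Euclidean distance D between the new and stored data points, compared against a tolerance. The tolerance value and the helper names are assumptions for illustration.

```python
import math


def euclidean_distance(s1, s2):
    """D = sqrt(sum over i of (S1_i - S2_i)^2) for vectors with the same sensors."""
    if len(s1) != len(s2):
        raise ValueError("vectors must contain the same sensors/values")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


def matches_by_distance(current, stored, tolerance):
    """True when the current vector lies closer than `tolerance` to the stored one."""
    return euclidean_distance(current, stored) < tolerance


current = [2.0, 1.5]  # hypothetical [temperature, pressure] residuals
stored = [2.0, 1.0]
print(euclidean_distance(current, stored))        # 0.5
print(matches_by_distance(current, stored, 1.0))  # True (tolerance is hypothetical)
```

In Python 3.8 and later, `math.dist(s1, s2)` computes the same quantity.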

A second technique for determining the distance between the current residual vector and a historical residual vector involves calculating the distance between the two data points along the surface of the unit circle or sphere centered at the origin (i.e., the distance following the curve of the circle/sphere instead of following a chord through it). This technique is employed with normalized residual vectors. Although this technique is adapted for use with normalized vectors, either technique should identify the same closest historical residual vector, because for unit vectors the chord length, $2\sin(\theta/2)$, increases monotonically with the arc length $\theta$ over the range $0 \le \theta \le \pi$.
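The statement that the chord and arc techniques pick the same closest vector can be checked with a short Python sketch. It assumes the stored and current vectors are already unit length; the library contents and fault codes are hypothetical.

```python
import math


def chord_distance(u, v):
    """Straight-line distance through the unit circle/sphere."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def arc_distance(u, v):
    """Distance along the unit circle/sphere surface, i.e. the angle in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp rounding error into [-1, 1]


def nearest(current, library, distance):
    """Fault code of the stored unit vector closest to `current`."""
    return min(library, key=lambda code: distance(current, library[code]))


library = {"A": [0.6, 0.8], "B": [0.8, 0.6], "C": [0.0, 1.0]}  # hypothetical
current = [1.0, 0.0]
print(nearest(current, library, chord_distance))  # B
print(nearest(current, library, arc_distance))    # B
```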

The second distance technique is performed by projecting the current residual vector 
onto the historical residual vector to which it is being compared. This projection is done by 
calculating the scalar (dot) product of the vectors. 

$$P = S_1 \cdot S_2 \qquad (5)$$

where:

• $S_1$ is the new vector;
• $S_2$ is the historical residual vector; and
• P is the projection of the new vector onto the stored vector.

If both vectors are normalized, the value of P lies between -1 and 1, because P is then the cosine of the angle between two unit-length vectors. A value for P of 1 indicates that vectors $S_1$ and $S_2$ are identical, and a value of -1 means that they are exactly opposite. In this embodiment, a match threshold may be set at, for example, 0.9.
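A sketch of the projection technique for unit-length vectors, using the example threshold of 0.9. Whether a value exactly equal to the threshold counts as a match is not specified; the `>=` comparison here is an assumption, as are the helper names.

```python
def projection(s1, s2):
    """P = S1 . S2; for unit-length vectors P lies between -1 and 1."""
    if len(s1) != len(s2):
        raise ValueError("vectors must contain the same elements")
    return sum(a * b for a, b in zip(s1, s2))


MATCH_THRESHOLD = 0.9  # example value from the text


def matches_by_projection(s1, s2, threshold=MATCH_THRESHOLD):
    return projection(s1, s2) >= threshold


current = [1.0, 0.0]  # hypothetical unit-length residual vector
print(projection(current, [1.0, 0.0]))             # 1.0  (identical)
print(projection(current, [-1.0, 0.0]))            # -1.0 (exactly opposite)
print(matches_by_projection(current, [0.6, 0.8]))  # False (P = 0.6)
```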

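If the vectors are not normalized, there are no bounds on the value of P. This situation may be addressed by scaling P by the squared magnitude of the larger vector:

$$P_{\text{scaled}} = \frac{S_1 \cdot S_2}{\max\left(\lvert S_1 \rvert, \lvert S_2 \rvert\right)^2} \qquad (6)$$

In this case, a perfect match of the current residual vector to the historical residual vector is scaled to a value of one. Because $\lvert S_1 \cdot S_2 \rvert \le \lvert S_1 \rvert \lvert S_2 \rvert \le \max(\lvert S_1 \rvert, \lvert S_2 \rvert)^2$, the scaled value also lies between -1 and 1, and it equals one only when the two vectors are identical; unlike the projection of normalized vectors, it is reduced by a difference in magnitude as well as by a difference in direction.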
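For vectors that are not normalized, the scaling can be sketched as dividing the dot product by the squared magnitude of the larger vector, so that a perfect match scores one. The function names are hypothetical.

```python
import math


def magnitude(v):
    return math.sqrt(sum(x * x for x in v))


def scaled_projection(s1, s2):
    """(S1 . S2) / max(|S1|, |S2|)^2; equals 1.0 only for identical vectors."""
    largest = max(magnitude(s1), magnitude(s2))
    if largest == 0.0:
        raise ValueError("both residual vectors are zero")
    return sum(a * b for a, b in zip(s1, s2)) / largest ** 2


print(scaled_projection([3.0, 4.0], [3.0, 4.0]))  # 1.0 (perfect match)
print(scaled_projection([3.0, 4.0], [6.0, 8.0]))  # 0.5 (same direction, twice the size)
```

Because the score drops when magnitudes differ, this variant keeps the intensity information that normalization discards.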

Figure 2 is a graph illustrating the comparison between a current residual vector 200 and historical residual vectors 210-250. In the example of Figure 2, the current residual vector 200 includes a temperature parameter and a pressure parameter. The fault classification unit 140 compares the current residual vector 200 to all historical residual vectors 210-250 of the appropriate type (e.g., same tool, recipe, etc.). The circles T represent the detection threshold (e.g., 0.9), and the values P represent the projection values (i.e., the distance from the origin of the unit circle for the projection). If P lies within the threshold T for a given historical residual vector 210-250, a match is identified. The fault classification unit 140 identifies the historical residual vector 240 as the closest to the current residual vector 200 and identifies the fault classification associated with the vector 240 as the most likely fault classification for the current residual vector 200. In the illustrated example, the historical residual vector 230 is also within the predetermined matching threshold, but not as close to the current residual vector 200 as the vector 240, so the fault classification unit 140 identifies the historical residual vector 230 as a possible fault classification match. Hence, the fault classification unit 140 identifies a most likely fault classification and, possibly, one or more other possible fault classifications for the current residual vector 200 based on the comparisons with the historical residual vectors 210-250.
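The selection of a most likely classification plus other possible classifications, as in the Figure 2 example, might be sketched as below. The coordinates and fault codes are hypothetical stand-ins for vectors 210-250 (they are not read from Figure 2), all vectors are assumed to be unit length, and the 0.9 threshold is the example value from the text.

```python
def rank_matches(current, library, threshold=0.9):
    """Return (most_likely_code, [other possible codes]) ranked by projection P."""
    scores = {code: sum(a * b for a, b in zip(current, stored))
              for code, stored in library.items()}
    hits = [code for code, p in scores.items() if p >= threshold]
    hits.sort(key=lambda code: scores[code], reverse=True)  # closest match first
    if not hits:
        return None, []
    return hits[0], hits[1:]


# Hypothetical unit vectors standing in for historical residual vectors 210-250
library = {
    "fault_210": [0.6, 0.8],
    "fault_220": [0.0, 1.0],
    "fault_230": [35 / 37, 12 / 37],
    "fault_240": [0.96, 0.28],
    "fault_250": [-0.6, 0.8],
}
current = [1.0, 0.0]  # hypothetical current residual vector 200
print(rank_matches(current, library))  # ('fault_240', ['fault_230'])
```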

A third technique for determining the distance between a current residual vector 200 and a historical residual vector 210-250 is to determine the angle, A, between the current residual vector 200 and the historical residual vector 210-250. If the angular distance, defined by the angle A, is less than a predetermined threshold, a match condition may be identified. For normalized vectors, the projection P of equation (5) equals cos A, so A may be computed as arccos P.
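A sketch of the angle technique: A is obtained from the dot product divided by both magnitudes, so the same function works whether or not the vectors have been normalized. For unit vectors, the example projection threshold of 0.9 corresponds to an angle of arccos 0.9, about 25.8 degrees; using that as the angular threshold is an assumption made here for illustration.

```python
import math


def angle_between(s1, s2):
    """Angle A in radians, from cos A = (S1 . S2) / (|S1| |S2|)."""
    dot = sum(a * b for a, b in zip(s1, s2))
    n1 = math.sqrt(sum(a * a for a in s1))
    n2 = math.sqrt(sum(b * b for b in s2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))  # clamp rounding error


MAX_ANGLE = math.acos(0.9)  # hypothetical angular threshold, about 25.8 degrees


def matches_by_angle(s1, s2, max_angle=MAX_ANGLE):
    return angle_between(s1, s2) < max_angle


print(math.degrees(angle_between([1.0, 0.0], [1.0, 1.0])))  # approximately 45
print(matches_by_angle([2.0, 0.0], [5.0, 0.5]))             # True
```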
If the current residual vector 200 is not within the comparison limits of any previously stored historical residual vector 210-250, the current residual vector 200 may be used to define a new classification of fault. The comparison limit may be set manually or generated through manual or automatic analysis of historical or real-time data.
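The match-or-create logic can be sketched with a small in-memory library. Everything here is hypothetical: the class name, the `NEW-0001` code format, and the choice of the projection technique with a 0.9 comparison limit. Vectors are assumed to be unit length; a real system would persist the library in a store such as the data store 110 and have a person assign a meaning to each new code once the fault is diagnosed.

```python
import itertools


class FaultLibrary:
    """Hypothetical in-memory store of historical residual vectors by fault code."""

    def __init__(self, limit=0.9):
        self.limit = limit  # comparison limit on the projection P
        self.vectors = {}   # fault classification code -> unit residual vector
        self._next_id = itertools.count(1)

    def classify(self, current):
        """Return (code, is_new) for a unit-length current residual vector."""
        best_code, best_p = None, self.limit
        for code, stored in self.vectors.items():
            p = sum(a * b for a, b in zip(current, stored))
            if p >= best_p:  # within the limit and at least as close as the best so far
                best_code, best_p = code, p
        if best_code is not None:
            return best_code, False
        # No historical vector lies within the comparison limit: open a new class
        code = f"NEW-{next(self._next_id):04d}"
        self.vectors[code] = list(current)
        return code, True


lib = FaultLibrary()
print(lib.classify([1.0, 0.0]))    # ('NEW-0001', True)
print(lib.classify([0.96, 0.28]))  # ('NEW-0001', False)
print(lib.classify([0.0, 1.0]))    # ('NEW-0002', True)
```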

If the current residual vector 200 matches one of the historical residual vectors 210-250, the historical residual vector (e.g., vector 240) that describes the fault classification type may be updated. The updated historical residual vector may be normalized prior to being stored in the data store 110, as described above. In some embodiments, prior to updating the historical residual vector, manual input may be provided confirming that the current residual vector does indeed match the fault classification associated with the historical residual vector. When a new residual vector is determined to be of the same type as a historical residual vector, the historical residual vector may be updated by adding the current residual vector 200 as part of a weighted average:



$$S_{\text{new}} = \frac{n\,S_{\text{old}} + S}{n + 1} \qquad (7)$$



where:

• $S_{\text{new}}$ represents the new historical residual vector;
• $S_{\text{old}}$ represents the old historical residual vector;
• $S$ represents the current residual vector 200 being used to update the historical residual vector; and
• n is the number of data points that have contributed to the old historical residual vector.
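A sketch of the running-average update, written from the variable definitions above as S_new = (n·S_old + S) / (n + 1). The function and argument names are hypothetical; if the library stores unit vectors, the result would be normalized again before storage.

```python
def update_running_average(s_old, n, s):
    """Return (S_new, n + 1) with S_new = (n * S_old + S) / (n + 1).

    `n` is the number of residual vectors already averaged into S_old.
    """
    if n < 0:
        raise ValueError("n cannot be negative")
    if len(s_old) != len(s):
        raise ValueError("vectors must contain the same elements")
    s_new = [(n * old + cur) / (n + 1) for old, cur in zip(s_old, s)]
    return s_new, n + 1


s_new, count = update_running_average([2.0, 1.0], 3, [4.0, 3.0])
print(s_new, count)  # [2.5, 1.5] 4
```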




Alternatively, the historical residual vector may be updated using an exponentially weighted moving average (EWMA):

$$S_{\text{new}} = \lambda\,S_{\text{old}} + (1 - \lambda)\,S \qquad (8)$$

where $\lambda$ is the EWMA weighting factor having a value between 0 and 1.
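The EWMA update can be sketched the same way, with the weighting factor applied to the old vector as in S_new = λ·S_old + (1 - λ)·S. The name `lam` and the inclusive bounds check are assumptions; a larger λ makes the stored vector change more slowly.

```python
def update_ewma(s_old, s, lam):
    """Return S_new = lam * S_old + (1 - lam) * S, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("the EWMA weighting factor must lie between 0 and 1")
    if len(s_old) != len(s):
        raise ValueError("vectors must contain the same elements")
    return [lam * old + (1.0 - lam) * cur for old, cur in zip(s_old, s)]


print(update_ewma([2.0, 1.0], [4.0, 3.0], 0.75))  # [2.5, 1.5]
```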

In some embodiments, the fault classification unit 140 may not update the historical residual vector based on the current residual vector. For example, the historical residual vectors may be generated through known actions, such as experiments intentionally run to induce fault conditions and record their responses. In such cases, the historical residual vectors may not be updated.

The fault classification unit 140 may employ two types of data records: classification model records and fault type records. The classification model record stores information about the group of stored historical residual vectors for a particular process context (e.g., tool and recipe). This record includes a descriptor that may be used to ensure that vectors being compared to stored vectors contain the same elements as the stored vectors. Information as to whether or not the stored vectors are normalized, whether they should be updated, etc. may also be stored in this record. The fault type records store information about an individual fault class and are used to communicate results. Examples of the type of information that might appear in a fault type record include a list of fault occurrences, as well as information about how the faults were remedied.
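The two record types might be modeled with Python dataclasses as below. The field names, the example context string, and the descriptor check are illustrative assumptions drawn from the description, not a defined schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ClassificationModelRecord:
    """Stored historical residual vectors for one process context (hypothetical layout)."""
    context: str                 # e.g. a tool and recipe identifier
    descriptor: Tuple[str, ...]  # ordered names of the vector elements
    normalized: bool = True      # whether stored vectors are unit length
    updatable: bool = True       # whether matches may update stored vectors
    vectors: Dict[str, List[float]] = field(default_factory=dict)  # code -> vector

    def accepts(self, element_names):
        """True if a vector has the same elements, in the same order, as the stored ones."""
        return tuple(element_names) == self.descriptor


@dataclass
class FaultTypeRecord:
    """Information about one fault class, used to communicate results (hypothetical)."""
    code: str
    occurrences: List[str] = field(default_factory=list)  # e.g. affected lot IDs
    remedies: List[str] = field(default_factory=list)     # how each fault was remedied


model = ClassificationModelRecord("etch_tool_A/recipe_7", ("temperature", "pressure"))
print(model.accepts(["temperature", "pressure"]))  # True
print(model.accepts(["pressure", "temperature"]))  # False
```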

In another embodiment, the fault classification unit 140 may be adapted to predict fault conditions before they are detected by the fault detection unit 120. In this embodiment, the fault detection unit 120 provides the fault classification unit 140 with residual vectors that have not been identified as representing a fault condition. The fault classification unit 140 compares these residual vectors to the historical residual vectors to determine whether a trend exists whereby they are getting closer to one or more of the historical residual vectors. By identifying such trends, the fault classification unit 140 may predict a fault condition before it is identified by the fault detection unit 120. The fault classification unit 140 may identify one or more potential fault conditions based on which historical residual vectors the trend appears to be approaching. Predicting the fault conditions may allow a corrective action to be implemented before faulty devices are produced that require rework or must be scrapped.
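One way such trend detection could work is sketched below: track the distance from each recent, non-fault residual vector to every historical residual vector, and flag the historical vectors toward which that distance is steadily shrinking. The trend test itself is not specified in the text; the strictly-decreasing rule over the last few points, the `window` parameter, and the example data are hypothetical.

```python
import math


def is_approaching(distances, window=3):
    """Hypothetical trend test: the last `window` distances strictly decrease."""
    if len(distances) < window:
        return False
    recent = distances[-window:]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


def approaching_faults(recent_residuals, library, window=3):
    """Fault codes whose historical vectors the recent residuals are moving toward."""
    flagged = []
    for code, stored in library.items():
        distances = [math.dist(r, stored) for r in recent_residuals]  # Python 3.8+
        if is_approaching(distances, window):
            flagged.append(code)
    return flagged


library = {"fault_A": [2.0, 0.0], "fault_B": [-2.0, 0.0]}  # hypothetical
recent = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]]             # drifting toward fault_A
print(approaching_faults(recent, library))  # ['fault_A']
```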

Turning now to Figure 3, a simplified flow diagram of a method for classifying faults based on residual vectors in accordance with another illustrative embodiment of the present invention is shown. In block 300, a current residual vector is received. In block 310, the current residual vector is compared to a plurality of historical residual vectors. Each historical residual vector has an associated fault classification code. In block 320, at least one of the historical residual vectors is selected responsive to determining that the current residual vector matches at least one of the historical residual vectors. In block 330, the fault condition is classified based on the fault classification code associated with the selected historical residual vector.

The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.